ABSTRACT: In June 2011, the Engineer Research and Development Center started a research project aimed at
understanding the connection between Civil Affairs (CA) tasks and the sociocultural information needed to support the
conduct of those tasks. The project, called CALICO (Civil Affairs Language Informing Cultural Operations), used
manual  and  automated  content-analysis  techniques  to  arrive  at  an  understanding  of  what  Army  doctrine  (as  represented 
in  CA  training  manuals)  reveals  about  the  connection  between  CA  tasks  and  supporting  sociocultural  knowledge.  One 
motivation  for  the  work  is  the  need  to  define  CA  tasks  in  such  a  way  that  they  can  be  represented  in  a  Battle 
Management  Language  (BML)  that  can  support  the  production  and  interpretation  of  digital  operations  orders  (DOOs). 
BMLs  are  at  present  able  to  support  DOOs  that  include  tasks  in  traditional,  kinetic  warfare  (e.g.,  ATTACK,  DEFEAT, 
TAKE,  HOLD),  but  research  is  needed  on  the  incorporation  of  non-kinetic  tasks.  The  incorporation  of  non-kinetic  tasks 
into  a  BML  would  enable  DOOs  to  be  constructed,  issued,  and  interpreted  for  Phase  0  (Shaping)  operations  and 
Humanitarian  Assistance  and  Disaster  Relief  (HA/DR)  operations.  Previous  research  demonstrated  that  non-kinetic 
tasks  can  be  represented  in  ways  consistent  with  BML  architectures  based  on  Lexical  Functional  Grammar,  but  research 
has  not  extended  to  the  question  of  what  sociocultural  information  is  relevant  to  what  task.  CALICO  analysis  examines 
CA  training  texts  tagged  using  a  schema  that  emerged  from  the  texts  themselves  rather  than  a  taxonomy  developed 
independently for another purpose (such as JC3IEDM). This paper describes the tagging effort as part of connecting
specific  non-kinetic  tasks  to  the  sociocultural  knowledge  that  supports  their  execution. 


1.  Introduction 

The  experiences  of  the  last  twelve  years  in  Iraq  and 
Afghanistan  have  taught  the  U.S.  Army,  and  indeed,  the 
Department  of  Defense  as  a  whole,  the  high  cost  in  blood 
and  treasure  of  failure  to  achieve  a  deep  understanding  of 
the  populations  whom  our  operations  seek  to  support  in 
pursuit of our national security interests (see, e.g.,
reference [1]).

In  order  to  be  useful  in  automated  decision  support 
environments  (e.g.,  command  and  control  [C2])  and 
modeling  and  simulation  (M&S),  the  military’s 
operational  tasks  must  be  describable  in  a  language  with 
maximal  specificity  and  minimal  ambiguity.  One  such 
language  is  battle  management  language  (BML). 
Research  has  determined  that  kinetic  tasks  (such  as  “take” 
and  “hold”)  can  be  described  and  communicated  in  a 
BML. Because the military has been conducting
well-defined force-on-force warfare for centuries, there is a
shared  model  that  has  been  tested  and  refined  and  that  is 
expressed  step-by-step  in  field  manuals  (FMs)  and  has 
been  made  routine  in  training  exercises.  Research  has 
established  that  the  fundamental  rules  of  kinetic  actions 
can  be  represented  in  a  grammar  (see  references  [2] 
through  [6]).  The  lexicon  for  kinetic  tasks  is  well 
articulated.  It  has  also  proven  possible  to  extend  the  range 
of  BML  to  encompass  geospatial  information  (see 
reference  [7]). 

Non-kinetic tasks, by contrast, have not been well articulated
in military terms. Doctrine describes in general terms
what  should  be  accomplished  in  Civil  Affairs  Operations 
(CAO)  in  accordance  with  the  views  of  subject  matter 
experts  (SMEs).  FM  3-57,  Civil  Affairs  Operations 
(reference  [8]),  provides  a  foundation  for  defining  the 
tasks,  objectives,  and  targets  of  CAO,  and  the  cultural  and 
social domains relevant to CAO have been identified on
the  basis  of  experience. 

To  describe  CAO  in  a  BML,  one  must  first  establish  the 
variety  and  nature  of  CA  tasks  empirically.  In  other 
words,  one  must  answer  the  questions:  “What  is  the  scope 
of  what  we  want  to  describe,  in  terms  of  tasks,  objectives, 
and  targets?”  In  simpler  terms,  “Who  needs  to  do  what 
and  with/for  whom  to  accomplish  the  CA  mission?”  The 
foundation  for  the  answers  to  these  questions  is  laid  in 
doctrine  that  specifically  addresses  CA  tasks:  FM  3-57 
and  FM  3-05.401,  Civil  Affairs  Tactics,  Techniques,  and 
Procedures  (reference  [9]).  What  is  needed  in  addition  to 
an  understanding  of  CA  tasks  is  a  way  to  capture  and 
represent  sociocultural  information  that  is  firmly 
grounded  in  what  the  Army  already  says  it  is  training  its 
Soldiers  to  do. 

The  Army’s  doctrinal  approach  to  sociocultural 
information  is  encyclopedic  (see  especially  Appendix  A 
to  reference  [9]).  It  is  not  possible  to  manage 
encyclopedic  information  either  in  a  decision  support 
system  or  in  an  M&S  environment.  Thus,  it  is  crucial  to 
begin  to  establish  what  portion  of  the  universe  of 
available  information  is  actually  relevant  to  a  given  task. 
The  content  analysis  of  CA  texts  addresses  the  question  of 
relevance  by  discovering  what  sociocultural  information 
is  related  to  what  CA  tasks. 

2.  The  corpus 

Previous  research  (see  reference  [10])  led  to  the 
conclusion  that  CA  FMs  do  not  themselves  include 
enough  detail  to  delimit  specific  tasks  and  connect  each  of 
those  tasks  to  the  agents  who  perform  them.  Instead,  it 
was  proposed  that  Army  training  texts  make  better  targets 
for  content  analysis  because  they  specify  the  tasks  the 
Army  says  its  CA  Soldiers  need  to  accomplish  and 
specify  the  agents  it  trains  to  accomplish  those  tasks. 
Content  analysis  will  expose  patterns  of  tasks,  agents,  and 
allied  sociocultural  information  in  Soldier’s  Manual  and 
Training  Guide,  MOS  38B,  Civil  Affairs  Soldier,  Skill 
Levels  1  through  4  (reference  [11],  referred  to  hereinafter 
as  ‘SMTG’),  with  a  view  to  understanding  what  actors 
need  what  information  for  what  tasks. 

The SMTG is foundational to CALICO’s analytical
process; indeed, it is normative for that process in the
sense that it both provides evidence for what needs to be
coded and is the basis for argumentation concerning how
it ought to be coded. Although FM 3-57 is occasionally
consulted for help in resolving problems presented by the
text, the SMTG is the principal guide and arbiter for the
coding schema. Unfortunately, the SMTG is a For Official
Use Only (FOUO) document, the contents of which cannot
be shared outside the U.S. Government (USG) and its
contractors. That fact constrains what
information  can  be  presented  in  this  forum,  but  we  hope 
to  provide  enough  information  here  to  raise  awareness 
and  stimulate  discussion. 

3.  The  CALICO  coding  schema 

The  coding  schema  is  a  set  of  tags  (applied  to  words  or 
phrases  in  the  CALICO  corpus)  that  label  the  concepts 
represented  in  a  word  or  phrase.  The  schema  consists  of 
four  components:  entity  tags,  descriptor  tags,  verb  tags, 
and  culture  tags.  Verb  tags  went  largely  unused  in  the 
project  and  will  not  be  discussed  in  this  paper.  Other  tag 
types  are  described  below.  The  schema  structure  is  based 
loosely  upon  WordNet  (see  reference  [12])  hierarchies 
and  relations,  but  most  components  were  developed 
inductively.  CALICO  project  members  sampled  the 
corpus  and  developed  tags  designed  to  optimize  coverage 
of  concepts  in  the  corpus  while  keeping  the  overall 
schema  simple. 

Tag  structure  is  hierarchical,  so  that  each  tag  can  be 
extended  to  a  greater  degree  of  specificity.  All  entity  tags 
are,  in  fact,  extensions  of  the  ‘entity’  parent;  for  example, 
‘#entity/events’ is the tag for all entities that are events.
The  CALICO  project  exploited  this  feature  only 
conservatively.  The  only  tags  so  extended  are  entity  tags, 
and  only  the  ‘#entity/agents’  tag  has  been  extended 
beyond  the  base  entity  type. 
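The hierarchical structure of the tags can be illustrated with a short Python sketch. The function names `ancestors` and `is_a` are hypothetical and are not part of CALICO; the sketch assumes only that a tag is a ‘#’-prefixed path whose levels are separated by ‘/’, as in the tags shown in this paper.

```python
def ancestors(tag: str) -> list[str]:
    """Return the tag followed by each of its parents, most specific first."""
    parts = tag.lstrip("#").split("/")
    return ["#" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def is_a(tag: str, parent: str) -> bool:
    """True if `tag` is `parent` itself or an extension of `parent`."""
    return parent in ancestors(tag)


# An extended agent tag is still an agent and still an entity.
print(ancestors("#entity/agents/us/m/ca"))
print(is_a("#entity/agents/us/m/ca", "#entity/agents"))
print(is_a("#entity/events", "#entity/agents"))
```

Under this reading, a query for the parent tag also retrieves everything tagged with one of its extensions, which is one plausible way a hierarchical schema could be exploited.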

3.1  Entity  tags 

Entity  tags  describe  ‘is-a’  relationships;  for  example, 
Washington,  D.C.  ‘is-a’  place,  and  the  president  ‘is-a’ 
agent.  Every  noun  must  have  at  least  one  entity  tag, 
because  everything  ‘is-a’  something.  CALICO  currently 
employs  14  entity  tags: 

•  agents 

•  agents/us 

•  agents/us/m 
• agents/us/m/ca

•  events 

•  info 

•  materials 

•  organizations 

• physical_infrastructures

•  places 

•  services 

• social_infrastructures

•  technical_capabilities 

•  time. 

As  noted,  these  tags  are  loosely  based  on  the  WordNet 
noun  hierarchy,  wherein  all  nouns  are  ultimately 
‘entities.’  We  selected  entities  for  the  schema  based  upon 
the  concepts  observed  in  the  SMTG.  Most  of  these  are 
self-evident;  for  example,  places,  times,  events, 
information,  and  materials  are  all  central  to  military 
operations.  Other  entities  are  more  complex  or  subtle;  for 
example,  social  infrastructure,  physical  infrastructure,  and 
technical  capabilities.  These  are  all  entities  whose  need 
and  function  may  not  be  intuitively  obvious  but  ultimately 
seemed indispensable.

3.2  Descriptor  tags 

Descriptor tags describe or qualify entities; for example,
Washington,  D.C.  is-a  ‘political’  place,  and  the  president 
is-a  ‘political’  agent.  Descriptor  tags  are  analogous  to 
adjectives  in  natural  language.  Entities  may  have  as  many 
descriptor  tags  as  needed,  and  they  may  have  no 
descriptors  if  that  is  most  appropriate.  CALICO  currently 
features  45  descriptor  tags.  The  set  of  descriptor  tags  is 
the  most  malleable  in  the  coding  schema,  and  CALICO 
actively  added,  deleted,  or  merged  these  tags  throughout 
the  course  of  code  application  and  analysis.  Descriptor 
tag  usage  patterns  are  also  more  varied  than  those  of 
entities  or  culture  tags.  Some  descriptors  are  common, 
such  as  ‘#civilian,’  ‘#military,’  ‘#public,’  and  ‘#private,’ 
while  others  are  so  narrowly  defined  they  have  only 
limited application, such as ‘#transition,’ ‘#extremist,’ or
‘#language.’ 

3.3  Culture  tags 

Culture  tags  are  used  to  mark  whose  culture  is  referred  to 
in a phrase. The options in CALICO’s context are ‘#us’
(i.e., United States), ‘#hn’ (i.e., host nation [HN]), or
‘#non’ (i.e., no particular cultural reference, ‘other’).
Even  in  a  document  such  as  the  CA  SMTG,  which  is  used 
in  the  training  of  the  Soldiers  whose  specialty  is 
interaction  with  the  local  populace,  most  of  the  phrases  in 
fact  refer  to  U.S.  culture.  All  descriptions  of  Army 
organizations, Army bureaucratic processes, and Army
protocol  are  examples  of  U.S.  culture.  HN  culture  could 
include  these  same  entity  types,  but  it  also  includes 
descriptions  of  HN  geography,  politics,  culture,  and  civil 
or  municipal  systems,  for  example.  The  ‘no-culture’  or 
‘other’  tag  ‘#non’  is  not  so  much  non-cultural  as  it  is 
ambiguously  cultural.  This  tag  is  reserved  for  entities  that 
have broad cultural associations, such as
intergovernmental organizations (IGOs). The culture tags help
CALICO  analysts  mark  who  is  being  talked  about,  U.S.  or 
HN,  and  to  retrieve  for  analysis  only  those  phrases 
explicitly tagged with ‘#hn.’ That leads to CALICO’s
operational definition of ‘sociocultural information,’
namely, any word or phrase to which some entity tag
(typically along with one or more descriptors) was applied
together with ‘#hn.’
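The operational definition can be written as a simple predicate. The following Python sketch is illustrative only; the function name `is_sociocultural` is hypothetical, and the sketch assumes that the tags applied to a phrase are available as a list of strings.

```python
def is_sociocultural(tags: list[str]) -> bool:
    """Apply the operational definition: some entity tag together with '#hn'."""
    has_entity = any(t == "#entity" or t.startswith("#entity/") for t in tags)
    return has_entity and "#hn" in tags


# Descriptors are typical but not required by the definition.
print(is_sociocultural(["#entity/agents", "#civilian", "#hn"]))
print(is_sociocultural(["#entity/places", "#hn"]))
print(is_sociocultural(["#entity/organizations", "#military", "#us"]))
```

The first two calls satisfy the definition; the third does not, because the phrase is tagged with ‘#us’ rather than ‘#hn.’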

3.4  Tagging  and  its  limitations 

We  determined  to  tag  the  CA  texts  with  a  set  of 
controlled,  carefully  vetted  tags  for  two  reasons.  First, 
BML  requires  a  simple,  austere  representation  of  tasks 
and  information.  The  CALICO  tags  are  abstractions  or 
generalizations  of  sociocultural  information  that  can  be 
used  consistently  and  unambiguously  to  represent  that 
information.  Second,  the  tags  allow  one  quickly,  if 
coarsely,  to  summarize  the  data  in  the  corpus  for 
statistical  analysis.  This  allows  one  to  understand  broadly 
what  types  of  information  the  Army  finds  most  relevant 
and  to  expose  relationships  between  types  of  information. 
For  example,  the  Army  might  find  information  about 
civilians  particularly  relevant,  especially  information 
about  social  infrastructure  in  the  area  of  operations. 

One  of  the  limitations  to  using  a  tagging  schema  is  that, 
although  it  is  consistent  and  unambiguous,  the  resolution 
of  the  representations  may  be  insufficient.  One  can  tag 
many  things  as  ‘social  infrastructure,’  for  example,  but 
the  tagging  system  may  not  possess  the  tags  needed  to 
represent  the  nuances  or  granularity  that  may  also  be 
important.  We  can  add  qualifier  tags,  such  as  ‘civilian 
social  infrastructure.’  This  does  solve  some  of  the 
resolution problem, but there could always be additional,
important information left uncoded. One could keep
adding  new  tags  to  capture  any  relevant  information,  but 
at  some  point  the  tagging  system  will  become  more 
burdensome  than  using  natural  language,  and  the  goals  of 
simplicity  and  austerity  are  defeated.  The  task  is  to  strike 
the  right  balance  between  parsimony  and  completeness.  A 
related  limitation  is  consistency  of  tagging.  Especially  as 
the  tagging  system  becomes  more  complicated,  it  is  more 
difficult  to  ensure  tags  are  used  consistently.  A  code 
book  might  be  developed  to  help  coders  apply  the  tags 
correctly,  but  with  more  tags  and  more  concepts  to  be 
tagged,  the  chance  of  human  error  increases.  It  is  likely 
that  semi-automated  tagging  could  ameliorate  this 
situation  somewhat. 

4.  Army  Tasks 

An  Army  task  is  “a  clearly  defined  and  measurable 
activity  accomplished  by  individuals  and  organizations.  It 
is  the  lowest  behavioral  level  in  a  job  or  unit  that  is 
performed  for  its  own  sake.  It  must  be  specific;  it  has  a 
definite  beginning  and  ending;  may  support  or  be 
supported  by  other  tasks;  has  only  one  action  and; 
therefore,  is  described  using  only  one  verb;  a  task  is 
performed  in  a  relatively  short  time;  and  it  must  be 
observable  and  measurable”  (reference  [11],  p.  1-4). 

An  example  from  the  Army  Universal  Task  List 
(reference  [13],  p.  2-27,  the  distribution  of  which  is  not 
limited  as  the  SMTG’s  is)  is  “Conduct  Civil  Support 
Operations.”  Each  such  task  represented  in  the  CA 
training  manuals  is  constructed  of  a  number  of  parts,  only 
one  of  which  is  important  here,  namely  what  are  called 
“performance  steps.”  Performance  steps  are  lists  (ordered 
either  sequentially  or  logically)  of  the  individual  activities 
that  lead  to  completion  of  a  task.  Each  is  presented  as  a 
command: 

•  Determine  the  purpose  of  X 

•  Establish  the  number  of  Y’s  in  the  area  of 
operations. 

•  Conduct  an  assessment  of  Z 

•  Identify  the  key  B’s  in  the  operational 
environment 

•  Assess  the  condition  of  C 

•  Collect  information  related  to  Q 

•  Document  the  condition  of  the  local  R’s. 


CALICO’s unit of analysis was the task, and its interest
was  in  the  sociocultural  information  that  accompanied  the 
performance  steps  within  each  analyzed  task. 

5.  CALICO  Tagging 

CALICO  focused  on  tagging  and  analyzing  the  objects  of 
the  verbs  in  the  performance  steps  that  make  up  certain 
CA  tasks.  Since  these  are  Civil  Affairs  tasks  (rather  than 
Maneuvers  tasks  or  Fires  tasks,  for  example),  one  might 
expect  a  focus  on  the  civil  component  of  the  operational 
environment.  Indeed,  the  SMTG  includes  a  large  number 
of  different  phrases  that  are  used  to  refer  to  the  civilian 
populace: 

•  Civil  component 

•  Civil  society 

•  Civil  population 

•  Civilians 

•  Civilian  populace 

•  Civilian  population 

•  Local  civilians 

•  Local  civilian  population 

•  Local  civilian  populace 

•  Local  individuals 

•  Local  nationals 

•  Nationals 

•  Their  own  people  (i.e.,  persons  subject  to  HN 
authorities) 

•  Children 

•  Non-military  personnel 

•  Noncombatants. 

The  string  ‘#entity/agents  #civilian  #hn’  would  be  applied 
to  each  of  these  phrases  or  words.  That  same  string  can 
then  be  used  as  the  starting  point  for  distinguishing  types 
of  civilians  within  the  populace  as  a  whole  by  adding 
tag(s): 

•  #entity/agents  #civilian  #dislocated  #hn  #non  - 
refugees 

•  #entity/agents  #civilian  #licit  #hn  -  citizens  of 
the  HN 

•  #entity/agents  #civilian  #non  -  third-country 
nationals 

•  #entity/agents  #civilian  #communication  #hn 
#non  #us  -  journalists 

Here are examples of how the tagging schema was
applied: 

•  If  a  performance  step  required  identifying  the 
people  employed  in  the  HN  public 
administration,  that  object  phrase  would  be 
tagged 

o  #entity/agents  #administrative  #public 
#hn 

•  If  a  performance  step  required  identification  of 
civilian  organizations  in  the  area  of  operations, 
the  following  string  would  be  applied  to  that 
phrase: 

o  #entity/organizations  #civilian  #hn 

•  If  a  performance  step  were  to  mandate  the 
development  of  knowledge  concerning  the  times 
at  which  agricultural  activities  occur,  that  phrase 
would  receive  the  tag: 

o  #entity/events  #entity/time  #agriculture 
#hn 

•  If  a  performance  step  were  to  express  an  interest 
in  the  civilians  in  the  area  who  are  employed, 
that  phrase  would  receive  the  string 

o  #entity/agents  #civilian  #economy  #hn 

•  If  a  performance  step  were  to  express  a  concern 
with  hospitals  or  clinics  available  to  civilians, 
that  phrase  would  be  tagged 

o  #entity/physical_infrastructures 
#civilian  #health  #hn. 

Another  way  to  conceptualize  something  of  the  nature  of 
the  CALICO  coding  schema  is  to  consider  the  categories 
under  which  CA  Soldiers  organize  the  information  they 
collect.  CA  Soldiers  use  the  acronym  ‘ASCOPE’  as  a 
mnemonic  device  for  these  categories:  Areas,  Structures, 
Capabilities,  Organizations,  People,  Events.  Since 
CALICO’s coding schema emerged from the needs
presented  by  the  SMTG,  it  is  not  surprising  that  there 
should  be  a  close  correspondence  between  the  entity  tags 
and  the  ASCOPE  categories: 

•  Areas  -  #entity/places 

•  Structures  -  #entity/physical_infrastructures 

•  Capabilities  -  #entity/technical_capabilities 

•  Organizations  -  #entity/organizations 

•  People  -  #entity/agents; 
#entity/social_infrastructures 

•  Events  -  #entity/events. 
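The correspondence listed above lends itself to a lookup table. The Python sketch below is an assumption about how such a table might be represented; the names `ASCOPE_TO_ENTITY_TAGS` and `ascope_category` are hypothetical, and only the category-to-tag pairs themselves come from the list above.

```python
# Category-to-tag pairs copied from the list above.
ASCOPE_TO_ENTITY_TAGS = {
    "Areas": ["#entity/places"],
    "Structures": ["#entity/physical_infrastructures"],
    "Capabilities": ["#entity/technical_capabilities"],
    "Organizations": ["#entity/organizations"],
    "People": ["#entity/agents", "#entity/social_infrastructures"],
    "Events": ["#entity/events"],
}


def ascope_category(tag: str):
    """Return the ASCOPE category for an entity tag, or None if unmapped."""
    for category, bases in ASCOPE_TO_ENTITY_TAGS.items():
        # An extended tag inherits the category of its base entity tag.
        if any(tag == base or tag.startswith(base + "/") for base in bases):
            return category
    return None


print(ascope_category("#entity/agents/us/m/ca"))
print(ascope_category("#entity/time"))
```

Entity tags that do not appear in the correspondence, such as ‘#entity/time,’ simply return no category in this sketch.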


CALICO’s entity tags align well with the ASCOPE
categories;  the  main  enhancement  the  schema  provides  is 
the  addition  of  the  ‘#entity/social_infrastructures’  tag, 
which  moves  both  the  ‘Organizations’  category  and  the 
‘People’  category  beyond  mere  enumeration  of  human 
actors  to  the  social  and  cultural  structures  that  shape 
human  actions.  Thus,  it  becomes  clear  that  the  CALICO 
coding  schema  offers  the  possibility  of  representing 
sociocultural  information  in  a  computerized  information 
management  system. 

6.  The  Representation  of  Sociocultural 
Information 

CALICO’s representations of sociocultural information
are  strings  of  tags  that  consist  of  no  fewer  than  the 
following  constituents,  in  the  following  order: 

•  At  least  one  entity  tag 

•  Descriptor  tags,  as  appropriate 

•  One  or  more  of  the  following,  as  appropriate: 
‘#hn,’  ‘#non,’  ‘#us.’ 

By  convention,  the  elements  within  each  tag  type  are 
presented  in  alphabetical  order. 
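The ordering convention can be sketched in Python as a function that arranges a set of tags into a tag string. The function name `canonical_tag_string` is hypothetical, and the sketch assumes that any tag that is neither an entity tag nor a culture tag is a descriptor.

```python
CULTURE_TAGS = {"#hn", "#non", "#us"}


def canonical_tag_string(tags: list[str]) -> str:
    """Order tags as entity tags, descriptors, culture tags; each group sorted."""
    entities = sorted(t for t in tags if t.startswith("#entity/"))
    cultures = sorted(t for t in tags if t in CULTURE_TAGS)
    descriptors = sorted(
        t for t in tags if t not in CULTURE_TAGS and not t.startswith("#entity/")
    )
    if not entities:
        raise ValueError("a tag string needs at least one entity tag")
    if not cultures:
        raise ValueError("a tag string needs at least one culture tag")
    return " ".join(entities + descriptors + cultures)


print(canonical_tag_string(
    ["#hn", "#health", "#entity/physical_infrastructures", "#civilian"]
))
```

Applied to the tags from the hospital example in Section 5, the function reproduces the string given there: ‘#entity/physical_infrastructures #civilian #health #hn.’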

The  earlier  presentation  of  sample  tag  strings  is 
suggestive  of  the  kinds  of  contrasts  that  can  be  drawn 
using  the  schema.  This  type  of  partially  hierarchical 
representation  would  be  subject  to  query  in  a 
computerized  system.  Indeed,  both  the  analysis  of  the 
sociocultural  content  of  the  tags  as  applied  in  the  SMTG 
and  the  analysis  of  relevance  were  based  on  queries  run 
against  the  annotated  corpus.  The  coding  schema  allows 
the  development  of  queries  that  are  quite  broad  and  of 
others  that  are  rather  more  granular,  at  the  same  time  as 
vocabulary  differences  within  and  between  documents  are 
leveled  by  the  use  of  tags  instead  of  free-text  searches. 
Such an approach would entail overhead in terms of
tagging the information against which queries would be
run as part of an automated decision support system. The
CALICO project also demonstrated (see reference [14])
the very real possibility that tagging can be
semi-automated (i.e., with a human in the loop) and thus lower
that overhead.
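The kind of query described here can be sketched in a few lines of Python. This is not the tooling used in the project, which this paper does not describe; the names `ANNOTATIONS` and `query` are hypothetical, and the annotated phrases are the four examples from Section 5.

```python
# Phrases and tag strings taken from the examples in Section 5.
ANNOTATIONS = {
    "refugees": {"#entity/agents", "#civilian", "#dislocated", "#hn", "#non"},
    "citizens of the HN": {"#entity/agents", "#civilian", "#licit", "#hn"},
    "third-country nationals": {"#entity/agents", "#civilian", "#non"},
    "journalists": {
        "#entity/agents", "#civilian", "#communication", "#hn", "#non", "#us",
    },
}


def query(annotations: dict, required: set) -> list:
    """Return, in sorted order, the phrases that carry every required tag."""
    return sorted(p for p, tags in annotations.items() if required <= tags)


# A broad query, then progressively more granular ones.
print(query(ANNOTATIONS, {"#entity/agents", "#civilian"}))
print(query(ANNOTATIONS, {"#civilian", "#hn"}))
print(query(ANNOTATIONS, {"#dislocated", "#hn"}))
```

Because the query matches tags rather than words, ‘refugees’ would be retrieved by the last query however the source text happened to phrase the concept.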

Although some of the CALICO tags are necessarily used
more than others within individual CA tasks, none of the
most highly relevant tags leads to fine sociocultural detail.
Tags instead represent categories of information that
could be attached to individual details for information
retrieval purposes. Their multifarious combinations allow
for binning of information and the construction of
searches that would allow those bins to be constructed
and deconstructed depending on the needs of the moment.

Approved for public release; distribution is unlimited.

Content analysis using CALICO’s working definition of
‘sociocultural  information’  (as  a  combination  of  entity 
tags  +  descriptor  tag[s]  +  ‘#hn’)  leads  to  the  conclusion 
that  there  is  a  great  deal  of  sociocultural  information  in 
the  corpus.  The  CALICO  entity  tags  for  the  most  part 
name  the  ASCOPE  categories,  and  thus  those  entity  tags 
provide  a  framework  of  categories  nicely  consistent  with 
CA doctrine. The descriptor tags then permit
sub-classification within those broad categories. The
representation  of  sociocultural  information,  therefore,  is 
by  named  bins,  where  the  bins  are  constructed  on  the  fly 
out  of  tag  strings.  Those  tag  strings  are  surrogates  for, 
i.e., representations of, categories of sociocultural
information.

7.  Relevance-to-Task  in  CALICO 

Relevance-to-task  for  CALICO  is  essentially  a  statistical 
property,  based  on  frequency  of  occurrence  of  descriptor 
tags  within  a  task.  As  text,  each  task  is  treated  as  a  bag  of 
words  in  which  each  bit  of  sociocultural  information  is 
potentially  relevant  to  (and  thus  equally  relevant  to)  the 
task  in  question.  It  is  therefore  necessary  to  determine  not 
what  information  is  relevant  to  the  task  but  instead  what 
information  is  most  relevant  to  the  task. 

For  each  task,  CALICO  took  the  following  approach: 

•  The  descriptor  tags  present  in  the  task  and  the 
number  of  times  each  is  applied  were  considered. 

• The top 10% (by absolute frequency of
occurrence of the ‘#hn’ tag) of all descriptors
applied in each task was identified.

• All the applications of the most used tags in each
task were inspected, which enables the
identification of the associated bits of sociocultural
information in that task.

• From those associated bits of sociocultural
information, key topics in the task were
identified.

• Inferential statistics were used to estimate
whether the tags our analysis deemed
particularly relevant for a given task in the
SMTG are likely to remain relevant in actual
performance of the task.

This  results  in  a  view  of  relevance  based  solely  on 
statistics  related  to  the  frequency  with  which  tags  occur 
inside  a  task.  That  information  can  be  clustered,  based 
either  on  similarity  of  content  or  on  co-occurring  entity 
tags. 
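The frequency-based selection step can be sketched as follows. This Python sketch is one possible reading of the procedure, not the project’s actual implementation: it assumes that only tag strings carrying ‘#hn’ are counted, that the 10% cut-off is rounded up to at least one descriptor, and that ties are left in first-seen order. The function name `top_descriptors` and the sample task data are hypothetical.

```python
from collections import Counter

CULTURE_TAGS = {"#hn", "#non", "#us"}


def top_descriptors(tag_strings: list[str], percent: int = 10) -> list:
    """Return the most frequent descriptors among '#hn' tag strings."""
    counts = Counter()
    for tag_string in tag_strings:
        tags = tag_string.split()
        if "#hn" not in tags:
            continue  # assumption: only host-nation strings are counted
        counts.update(
            t for t in tags
            if not t.startswith("#entity") and t not in CULTURE_TAGS
        )
    if not counts:
        return []
    # Round the cut-off up, keeping at least one descriptor (assumption).
    k = max(1, (len(counts) * percent + 99) // 100)
    return counts.most_common(k)


# Hypothetical tag strings for a single task (illustrative data only).
task = [
    "#entity/agents #civilian #hn",
    "#entity/organizations #civilian #hn",
    "#entity/physical_infrastructures #civilian #health #hn",
    "#entity/agents #civilian #economy #hn",
    "#entity/agents #administrative #public #hn",
    "#entity/organizations #military #us",
]
print(top_descriptors(task))
```

In this illustrative task, ‘#civilian’ is the only descriptor above the cut-off; the phrases to which it was applied would then be inspected by hand to identify key topics.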

What  CALICO  reveals  in  terms  of  relevance-to-task  is  the 
relative  importance  of  sociocultural  information. 
CALICO  cannot  answer  the  question  “What  sociocultural 
information  is  relevant  to  a  task?”  in  granular  detail. 
Instead,  it  answers  the  question  “What  categories  of 
sociocultural  information  occur  most  frequently  within  a 
task?”  There  are  two  reasons  that  the  first  question  is 
unanswerable.  First,  detailed,  granular  information  is 
simply  not  a  part  of  the  SMTG.  Only  categories  of 
information  are  found  in  the  text,  perhaps  with  an 
example  or  two  of  what  might  be  included  within  a  given 
category.  These  categories  of  information  are  captured 
and  represented  by  CALICO  tag  strings.  Secondly,  the 
view  of  relevance  that  tag-based  content  analysis  of  a  text 
supports  is  essentially  a  statistical  property  of  categories 
of  information  within  a  task. 

CALICO’s view of relevance is static because the text of
the  SMTG  is  static  until  a  CA  Soldier  begins  to  conduct  a 
task  in  a  specific  context.  As  noted  above,  each  bit  of 
sociocultural  information  in  an  SMTG  task  is  potentially 
relevant  to  (and  thus  equally  relevant  to)  the  task.  These 
sociocultural  elements  in  a  task  can  be  aggregated  by  the 
content  analyst  into  topics.  Some  of  these  topics  can  then 
be  characterized  as  more  relevant  than  others,  given  some 
statistical  cut-off.  This  state  of  affairs  is  a  function  of  the 
nature  of  the  text  itself:  The  SMTG  cannot  include  more 
than  categories  and  examples,  because  precise 
sociocultural  detail  is  profoundly  context-specific. 

Another  question  that  might  arise  is  this:  “To  what  extent 
does  the  sociocultural  information  in  the  text  contribute  to 
an  understanding  of  what  is  really  important?”  The 
information  that  is  collected  and  organized  using  the 
CALICO  categories  needs  to  be  situated  in  a  framework 
that  allows  the  significance  of  the  information  to  be 
articulated. It is not enough to collect information, as CA
Soldiers  know  well;  what  is  collected  must  be  analyzed, 
must  be  interpreted,  must  be  understood.  But  neither  the 
SMTG  itself  nor  the  wider  Army  doctrinal  context  in 
which  the  SMTG  is  embedded  provides  a  robust 
framework  within  which  that  understanding  can  be 
developed.  Instead,  doctrine  provides  organizational 
schemata (e.g., PMESII and ASCOPE). Information
organized is one thing; information understood is another.
CALICO’s parent project CREATE (Cultural Reasoning
and  Ethnographic  Analysis  for  the  Tactical  Environment) 
undertakes  the  development  of  frameworks  and  tools  that 
make  the  knowledge  from  the  social  sciences  available  to 
analysts  for  presentation  to  decision  makers  (see  reference 
[15]). 